Abstract 

Background: The first distinct differentiation event in mammals occurs at the blastocyst stage when totipotent 
blastomeres differentiate into either pluripotent inner cell mass (ICM) or multipotent trophectoderm (TE). Here we 
determined, for the first time, global gene expression patterns in the ICM and TE isolated from bovine blastocysts. 
The ICM and TE were isolated from blastocysts harvested at day 8 after insemination by magnetic activated cell 
sorting, and cDNA sequenced using the SOLiD 4.0 system. 

Results: A total of 870 genes were differentially expressed between ICM and TE. Several genes characteristic of ICM 
(for example, NANOG, SOX2, and STAT3) and TE (ELF5, GATA3, and KRT18) in mouse and human showed similar 
patterns in bovine. Other genes, however, showed differences in expression between ICM and TE that deviated 
from those expected based on mouse and human. 

Conclusion: Analysis of gene expression indicated that differentiation of blastomeres of the morula-stage embryo 
into the ICM and TE of the blastocyst is accompanied by differences between the two cell lineages in expression of 
genes controlling metabolic processes, endocytosis, hatching from the zona pellucida, paracrine and endocrine 
signaling with the mother, and genes supporting the changes in cellular architecture, stemness, and hematopoiesis 
necessary for development of the trophoblast. 
Background 

Following its formation by syngamy of the pronuclei of 
the oocyte and sperm, the mammalian embryo begins life 
as a totipotent, single cell organism. Subsequent cycles of 
cell division and the formation of tight junctions between 
blastomeres lead to a condition whereby blastomeres on 
the outer face of the embryo exhibit different patterns of 
cell polarity, gene expression and protein accumulation 
than blastomeres on the inner part of the embryo [1-4]. 
Non-polarized blastomeres in the inner part of the embryo 
are destined to form the pluripotent inner cell mass (ICM) 
that gives rise to the embryo while polarized cells in the 
outer face of the embryo are fated to differentiate into the 
trophectoderm (TE), which develops into extraembryonic 
membranes. Cell fate may be determined as early as the 
4-8 cell stage in the mouse and depend upon differences 
between blastomeres in the kinetics of the interaction 
between the transcription factor Pou5f1 and DNA binding 
sites [5]. Nonetheless, blastomeres do not undergo lineage 
commitment until about the 32-cell stage (in mice), based 
on loss of ability of blastomeres to form either ICM or TE 
[2]. 

Lineage commitment towards ICM or TE is under the 
control of specific transcription factors. The exact role 
of at least some transcription factors varies with species 
[6]. In the best studied species, the mouse, the ICM is 
regulated by Sall4, Pou5f1, Sox2 and Nanog while TE 
formation results from a cascade of events involving 
Yap1, Tead4, Gata3, Cdx2, Eomes and Elf5 [7]. Functional 
properties of the two cell lineages are also divergent. 
In part, this reflects the processes responsible for 
establishment and maintenance of cell lineage, such as 
differences in transcription factor usage, cell signaling 
pathways and epigenetic marks [7,8]. In addition, the 
function of the ICM, which is fated to undergo a series 
of differentiation events in the gastrulation process, is 
different from that of the TE, which is destined to interact with 
the lining of the maternal reproductive tract. 

In the present study, we describe, for the first time, 
differences in the transcriptome of the ICM and TE with 
the objective of understanding the consequences of the 
differentiation of these two cell types for cellular func- 
tion. This was achieved by separating ICM and TE using 
a newly-developed immunomagnetic procedure [9] fol- 
lowed by next-generation sequencing. Results reveal the 
implications of the spatial and developmental differenti- 
ation of these first two lineages of the preimplantation 
embryo with respect to metabolism, interaction with the 
maternal system and changes in cellular architecture. In 
addition, aspects of molecular control of the process of 
lineage commitment and differentiation are illustrative 
of similarities and differences with the prototypical 
mouse model. 

Methods 

Reagents 

All reagents were purchased from Sigma-Aldrich (St. 
Louis, MO, USA) or Fisher Scientific (Pittsburgh, PA, 
USA) unless otherwise specified. 

Embryo culture and ICM/TE isolation 

Bovine embryos were produced from slaughterhouse- 
derived oocytes using procedures for in vitro oocyte 
maturation, fertilization, and embryo culture as 
described previously [10]. Ovaries were donated by Central 
Packing, Center Hill, Florida. The day of fertilization 
was defined as Day 0. After fertilization for 18-20 h, 
embryos were cultured in SOF-BE1 medium [11] at 
38.5°C in a humidified atmosphere of 5% CO2 and 5% 
O2 with the balance N2. Embryos were cultured in 
groups of 30 in a 50 µl culture drop under mineral oil. 
At Day 6, an additional 5 µl of culture medium was added. 
At Day 8, blastocysts were harvested and used to prepare 
ICM and TE by magnetic-activated 
cell sorting as reported previously [9]. 

Three separate pools of TE and ICM for each treat- 
ment were obtained. Each pool was prepared using 88 to 
102 blastocysts. A total of 15 fertilization procedures 
were used to prepare the blastocysts; a set of three bulls 
was used for fertilization for each procedure. 

RNA preparation, library construction and sequencing 
using SOLiD 4 system 

Total RNA was isolated from each pool of embryonic 
cells using the PicoPure RNA Isolation Kit (Applied Biosystems, 
Foster City, CA, USA) according to the 
manufacturer's instructions. The quality of RNA was 
assessed using the Agilent 2100 Bioanalyzer (Agilent 
Technologies, Santa Clara, CA). Amplified cDNA was 
prepared from total RNA for RNA-Seq applications 
using the Ovation RNA-Seq kit (NuGen Technology, 
San Carlos, CA). Barcoded fragment libraries were constructed 
using the SOLiD™ v4 fragment library kit 
according to the manufacturer's protocol (Applied Biosystems). 
Briefly, double-stranded cDNA was sheared to 
150-180 bp fragments using a Covaris™ S2 sonication 
system (Covaris, Woburn, MA). The fragmented DNA 
was subsequently end-repaired and blunt-end ligated to 
P1 and P2 adaptors. The adaptor-ligated, purified and 
size-selected 200-270 bp fragments were nick-translated 
and then amplified using primers specific to the P1 and P2 
adaptors and Platinum® PCR Amplification Mix (Applied 
Biosystems). The quality of the libraries and the fragment 
distribution were verified by running 1 µl of each 
library on an Agilent DNA 1000 chip (Agilent Technologies). 
Amplified libraries (5 different libraries pooled for 
each slide) were immobilized onto SOLiD P1 DNA 
beads (Applied Biosystems). The bead-bound libraries 
were then clonally amplified by emulsion PCR according 
to the Applied Biosystems SOLiD™ 4 Systems Tem- 
plated Bead Preparation Guide. After amplification, 
emulsions were disrupted with 2-butanol and the beads 
containing clonally amplified template DNA were P2- 
enriched and extended with a bead linker by terminal 
transferase. The quantity of the beads was determined 
using a NanoDrop® ND1000 spectrophotometer 
(Thermo Scientific, Wilmington, DE). Approximately 
600-700 million beads were deposited on each slide (three slides 
were run in total) and sequenced using 'sequencing by 
ligation' chemistry and the 50x5 bp protocol on the 
SOLiD v4 sequencer (Applied Biosystems) at the 
Interdisciplinary Center for Biotechnology Research, 
University of Florida. Results were obtained as color 
space fasta files. 

Analysis of read data 

Raw sequencing reads were initially processed with Gen- 
omeQuest tools [12]. Ambiguous residues were trimmed 
off from both sides of the sequence. Bases with Phred 
quality below 12 from the 3' end of the sequence were 
removed. Reads that were shorter than 40 bases or that 
contained more than 10 bases with quality below 12 
were also discarded, as were reads in which a repetitive 
single base accounted for more than 60% of the 
length at the 3' end. About 53-64% of reads were 
retained after clean-up, providing 102-157 million clean 
reads for the three replicates of each treatment. 
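
The quality rules above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (Phred 12, minimum length 40, at most 10 low-quality bases); it is not the GenomeQuest implementation, the function names are hypothetical, and the trimming of ambiguous residues and the repetitive-base filter are left out.

```python
from typing import List, Optional, Tuple

MIN_QUALITY = 12       # Phred threshold stated in the text
MIN_LENGTH = 40        # reads shorter than this are discarded
MAX_LOW_QUALITY = 10   # reads with more low-quality bases than this are discarded


def trim_3prime(seq: str, quals: List[int]) -> Tuple[str, List[int]]:
    """Remove bases with Phred quality below MIN_QUALITY from the 3' end."""
    end = len(seq)
    while end > 0 and quals[end - 1] < MIN_QUALITY:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if it fails the length or quality filter."""
    seq, quals = trim_3prime(seq, quals)
    if len(seq) < MIN_LENGTH:
        return None
    low_quality = sum(1 for q in quals if q < MIN_QUALITY)
    if low_quality > MAX_LOW_QUALITY:
        return None
    return seq, quals
```

A 50-base read whose last five bases fall below Phred 12 would be kept as a 45-base read, whereas a read that trims to 35 bases would be dropped.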

For mapping to the genome, the Bos taurus genomic 
sequence bosTau4 (repeat masked) was downloaded 
from the UCSC genome browser (http://genome.ucsc.edu/). 
Sequencing reads of each sample were mapped independently 
to the reference sequences using TopHat 
1.2.0 [13]. TopHat splits reads into segments and joins the 
segment alignments. A maximum of one mismatch in each 
of the 25 bp segments was allowed. This step mapped 
36.8% of reads to the genome. The unmapped reads were 
collected and mapped to the reference using Bowtie 
0.12.7 [14] allowing three mismatches. Unmapped reads 
were further mapped to cDNA sequences using bfast 
0.6.4 [15] while allowing for three mismatches for each 
read. The cDNA sequences of B. taurus were down- 
loaded from the National Center of Biotechnology Infor- 
mation. Scaffold and chromosome sequences were 
cleared and a total of 35,842 sequences were obtained 
(http://www.ncbi.nlm.nih.gov/nuccore/?term=txid9913[Organism:noexp]). 
Bfast aligned 27.6% of the total reads 
to the cDNA sequences. Therefore, a total of 64.4% of reads, or 
595 million reads, were mapped successfully. Of the 
mapped reads, 89.8% were uniquely mapped to either the 
genome or cDNA sequences. Data were deposited in the 
DDBJ Sequence Read Archive at http://www.ddbj.nig.ac.jp/index-e.html 
(Submission DRA000504). 

Digital gene expression was determined as follows. 
The number of mapped reads for each individual gene 
was counted using the HTSeq tool (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html) 
with intersection-nonempty mode. HTSeq takes two input 
files: BAM or SAM-format files of mapped reads and a 
gene model file. The Ensembl gene annotation file in 
GTF format was downloaded from the UCSC genome 
browser. The DESeq package [16] in R was used for 
digital gene expression analysis. DESeq uses the negative 
binomial distribution, with variance and mean linked by 
local regression, to model the null distribution of the 
count data. Significant up- and downregulated genes 
were selected using two cutoffs: an adjusted P value of 
0.05 and a minimum fold-change of 1.5. 
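
As an illustration of how the two cutoffs combine, the following Python sketch selects genes from a list of per-gene results. It is not the DESeq code itself: the `GeneResult` structure and function name are hypothetical, the fold change is assumed to be expressed as TE/ICM (the convention of Table 4), and the example values are the Table 4 entries for six genes.

```python
from typing import Dict, List, NamedTuple


class GeneResult(NamedTuple):
    gene: str
    fold_change: float   # assumed here to be TE/ICM, as in Table 4
    adjusted_p: float


def select_significant(
    results: List[GeneResult],
    max_adjusted_p: float = 0.05,
    min_fold_change: float = 1.5,
) -> Dict[str, List[str]]:
    """Split genes passing both cutoffs into TE-upregulated and ICM-upregulated."""
    up_in_te: List[str] = []
    up_in_icm: List[str] = []
    for r in results:
        if r.adjusted_p >= max_adjusted_p:
            continue  # fails the adjusted P value cutoff
        if r.fold_change > min_fold_change:
            up_in_te.append(r.gene)
        elif r.fold_change < 1.0 / min_fold_change:
            up_in_icm.append(r.gene)
    return {"up_in_TE": up_in_te, "up_in_ICM": up_in_icm}


# Fold change (TE/ICM) and adjusted P value as reported in Table 4.
TABLE4_EXAMPLES = [
    GeneResult("CDX2", 0.49, 0.780),
    GeneResult("ELF5", 5.41, 0.022),
    GeneResult("GATA3", 2.69, 0.018),
    GeneResult("NANOG", 0.21, 0.000),
    GeneResult("POU5F1", 0.78, 0.605),
    GeneResult("SOX2", 0.44, 0.005),
]

selected = select_significant(TABLE4_EXAMPLES)
```

With these inputs, ELF5 and GATA3 fall in the TE-upregulated group and NANOG and SOX2 in the ICM-upregulated group, while CDX2 and POU5F1 fail the adjusted P value cutoff.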

Classification of differentially expressed genes into gene 
ontology (GO) classes 

Differentially expressed genes were annotated by the 
Database for Annotation, Visualization and Integrated 
Discovery (DAVID; DAVID Bioinformatics Resources 
6.7, http://david.abcc.ncifcrf.gov/) [17]. Most genes were 
annotated using the bovine genome as a reference and 
additional genes were annotated by comparison to the 
human genome. The DAVID database was queried to 
identify GO classes enriched for upregulated and down- 
regulated genes. Functions of differentially expressed 
genes were further annotated using the Kyoto Encyclopedia 
of Genes and Genomes (KEGG, http://www.genome.jp/kegg/). 
An overview of the differentially regulated KEGG 
pathways was mapped onto the KEGG pathway map using 
iPath2.0 (http://pathways.embl.de/) [18]. 



To further analyze patterns of genes differentially 
regulated between ICM and TE, k-means clustering was 
performed. The read count data of the 870 significant 
genes for the ICM-control versus TE-control comparison 
were clustered using the k-means strategy [19]. To estimate 
the optimal number of clusters, k values from 3 to 
100 were tested and the corresponding sum of squares 
error (SSE) [20] was calculated for each k value. SSE is 
defined as the sum of the squared distances between each 
member of a cluster and its cluster centroid. The SSE 
values dropped abruptly until k = 8 (results not shown). 
To balance a low SSE against a small number of clusters, 
k = 8 was selected as the optimal parameter for clustering 
genes and a heatmap was generated using the heatmap.2 
function in R. 
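
The SSE definition given above can be written out as a short Python sketch. The function names are hypothetical and the clustering step itself is not shown; the code only computes SSE for a given assignment of points to clusters, which is the quantity one would compare across candidate values of k.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def sum_of_squares_error(
    points: List[Sequence[float]], labels: List[Hashable]
) -> float:
    """Sum of squared distances between each point and its cluster centroid."""
    clusters: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total
```

For four points at (0, 0), (0, 2), (10, 0) and (10, 2), a single cluster gives a much larger SSE than the natural two-cluster split, which is the kind of abrupt drop used to choose k.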

Enrichment analysis for transcription factor binding sites 

For each differentially expressed gene, the candidate promoter 
region was defined as the span of nucleotides 
from 200 bp upstream to 50 bp downstream of the 
transcriptional start site identified in Ensembl. To detect 
putative transcription factor binding sites (TFBS) in each 
promoter, we followed the method of Wasserman and 
Sandelin [21]. Position-specific weight matrices were 
obtained from the JASPAR database [22]. The score was 
calculated by formula 1 in Additional file 1. We also calculated 
the ratio of the score to the maximum score by 
formula 2 (Additional file 1). Statistical significance of 
each TFBS was evaluated using the hypergeometric 
distribution and formula 3 (Additional file 1). 
We ran the 'match' program with the 'minSUM' and 
'minFP' thresholds to detect TFBS [23]. Statistical significance 
of each detected TFBS was evaluated by the 
hypergeometric distribution as described above. 
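
Formula 3 itself is given only in Additional file 1, so the following Python sketch should be read as an assumption about its general form: a standard upper-tail hypergeometric test for over-representation of a binding site among a set of promoters. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(
    population: int,
    successes_in_population: int,
    sample: int,
    successes_in_sample: int,
) -> float:
    """P(X >= successes_in_sample) when drawing `sample` items without replacement.

    population:              all promoters considered (assumed background set)
    successes_in_population: promoters in the background that carry the TFBS
    sample:                  promoters of the differentially expressed genes
    successes_in_sample:     promoters in that set that carry the TFBS
    """
    upper = min(sample, successes_in_population)
    total = comb(population, sample)
    tail = sum(
        comb(successes_in_population, k)
        * comb(population - successes_in_population, sample - k)
        for k in range(successes_in_sample, upper + 1)
    )
    return tail / total
```

A small probability indicates that the binding site occurs in the gene set more often than expected from its frequency in the background.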

Calculation of GC contents and detection of CpG islands 

The method by Gardiner-Garden and Frommer [24] was 
used to identify CpG islands in the region encompassing 
the 100 nucleotides upstream and 100 nucleotides downstream 
from the start site. Transcriptional start sites for differentially 
expressed genes were obtained from UMD3.1 
[25]. For the definition of CpG islands, the GC content 
was calculated as ([C]+[G])/200, where [N] denotes the 
number of nucleotides "N" within the 200-base window. 
The CpG score was calculated as ([CG]*200)/([C]*[G]). A 
gene was classified as CpG positive when its GC content in 
the region spanning the 100 nucleotides upstream and the 
100 nucleotides downstream from the start site exceeded 0.5 
and when the CpG score in the same region exceeded 0.6. 
Otherwise, a gene was classified as CpG negative. Chi- 
square analysis was used to determine whether the percent 
of genes classified as CpG positive differed between 1) 
genes overexpressed in ICM versus genes overexpressed in 



Ozawa et al. BMC Developmental Biology 2012, 12:33 
http://www.biomedcentral.com/1471-213X/12/33 






TE and 2) genes overexpressed in ICM or TE versus the 
reference population of 25118 genes in the bovine genome. 
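
The CpG classification can be sketched in Python as follows. The sketch assumes the CpG score is the observed/expected ratio of Gardiner-Garden and Frommer, that is, the number of CG dinucleotides multiplied by the window length and divided by the product of the C and G counts; function names are hypothetical and the window is assumed to be the 200-nucleotide region around the start site.

```python
def gc_content(window: str) -> float:
    """Fraction of C and G nucleotides in an upper-case window."""
    return (window.count("C") + window.count("G")) / len(window)


def cpg_score(window: str) -> float:
    """Observed/expected CpG ratio for an upper-case window."""
    c = window.count("C")
    g = window.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both C and G
    cg = window.count("CG")
    return cg * len(window) / (c * g)


def is_cpg_positive(window: str, min_gc: float = 0.5, min_score: float = 0.6) -> bool:
    """Apply both thresholds stated in the text to a window around the start site."""
    window = window.upper()
    return gc_content(window) > min_gc and cpg_score(window) > min_score
```

A window can be GC-rich and still be classified as CpG negative when its C and G nucleotides rarely occur as CG dinucleotides.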

Confirmation of differences in gene expression between 
ICM and TE by quantitative PCR 

An experiment was performed to verify the effect of cell 
type (ICM vs TE) and CSF2 on relative mRNA abundance 
of GATA3, ELF5, CDX2, NANOG and SOX2. Embryos 
were prepared as described previously and blastocysts 
were collected at Day 7. Pools of 25-34 blastocysts were 
submitted to magnetic-activated cell sorting [9]. A total of 
6 biological replicates of ICM and TE were prepared. 
mRNA extraction was performed using the AllPrep 
DNA/RNA Mini Kit (Qiagen, Inc., Valencia, CA, USA) followed 
by DNase (Qiagen) treatment and reverse transcription 
(High Capacity cDNA Reverse Transcription Kit, 
Applied Biosystems, Foster City, CA). Transcript abundance 
for GATA3, ELF5, CDX2, NANOG and SOX2 as 
well as the housekeeping genes GAPDH, SDHA and YWHAZ 
was quantified with a CFX96 Real-Time system thermal 
cycler (Bio-Rad, Hercules, CA, USA) using SsoFast 
EvaGreen Supermix reagent (Bio-Rad, Hercules, CA, 
USA). PCR conditions were as follows: 30 sec at 95°C followed 
by 40 cycles each of 5 sec at 95°C and 1 min at 
60°C. Data were analyzed using the delta-delta cycle 
threshold (Ct) method. The reference value was the 
geometric mean of the Ct values of GAPDH, SDHA 
and YWHAZ. Primers for ELF5 were based on 
NM_001024569.1 and were designed using PrimerQuest 
software from idtDNA (http://www.idtdna.com). Efficiency 
was 95% and identity of amplicons was verified 
by sequencing products. The primers were 
5' TGCCATTTCAACATCAGTGGCCTG 3' and 
5' AAGGCCACCCTCAAAGACTATGCT 3'. Other primer pairs 
were published previously: GATA3 [26], CDX2 and 
NANOG [9], SOX2 [27] and GAPDH, SDHA and YWHAZ 
[28]. 
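
The delta-delta Ct calculation with a geometric-mean reference can be sketched in Python as below. The sketch assumes 100% amplification efficiency (a factor of 2 per cycle), the function names are hypothetical, and any Ct values passed in are illustrative rather than data from this study.

```python
from math import exp, log
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed in log space."""
    return exp(sum(log(v) for v in values) / len(values))


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Ct of the target gene minus the geometric mean Ct of the reference genes."""
    return target_ct - geometric_mean(reference_cts)


def relative_expression(
    sample_target_ct: float,
    sample_reference_cts: Sequence[float],
    calibrator_target_ct: float,
    calibrator_reference_cts: Sequence[float],
) -> float:
    """Fold difference of sample versus calibrator, assuming doubling per cycle."""
    ddct = delta_ct(sample_target_ct, sample_reference_cts) - delta_ct(
        calibrator_target_ct, calibrator_reference_cts
    )
    return 2.0 ** (-ddct)
```

For example, a target that reaches threshold two cycles earlier in TE than in ICM, with identical reference Ct values in both, corresponds to a fourfold higher relative abundance in TE.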

Data were analyzed by least-squares analysis of variance 
using the General Linear Model (GLM) procedure of the 
Statistical Analysis System, version 9.2 (SAS Institute Inc, 
Cary, NC, USA). Sources of variation in the model 
included cell type (ICM and TE), replicate and the inter- 
action; cell type was considered fixed and replicate was 
considered random. Logarithmic transformation was ap- 
plied to CDX2 data to improve normality. All data are 
reported as untransformed least-squares means. 

Results 

Differentially expressed genes 

The lists of differentially expressed genes, determined 
using an adjusted P value of <0.05 and > 1.5-fold differ- 
ence as cut-offs, are presented in Additional file 2. There 
were a total of 870 genes that were differentially expressed 
between ICM and TE, with 411 genes upregulated in the 



ICM and 459 downregulated in the ICM (i.e., upregulated 
in the TE). 

Annotation of genes differentially expressed between 
ICM and TE 

Differentially expressed genes were annotated using the 
Gene ID conversion tool of the DAVID Bioinformatics 
Resources 6.7 (http://david.abcc.ncifcrf.gov/conversion. 
jsp); 835 of the 870 differentially expressed genes were 
annotated (389 genes upregulated in the ICM and 424 
genes upregulated in the TE). For the list of genes up- 
regulated in ICM, 10 GO terms were listed in the Bio- 
logical Process group, 4 GO terms in the Cell 
Component group, and 5 terms in the Molecular Func- 
tion group (Table 1). Terms related to transcriptional ac- 
tivities were dominant including regulation of 
transcription, DNA-dependent (25 genes), regulation of 
transcription from RNA polymerase II promoter (11 
genes), DNA binding (29 genes), transcription regulator 
activity (22 genes) and transcription factor activity (17 
genes). There were also GO terms related to metabolic 
activity including regulation of RNA metabolic process 
(25 genes), positive regulation of macromolecule meta- 
bolic process (12 genes), negative regulation of macro- 
molecule metabolic process (10 genes), and enzyme 
binding (10 genes). 

For genes upregulated in TE, 12 GO terms were listed 
in the Biological Process group, 12 in the Cell Compo- 
nent group, and 9 in the Molecular Function group 
(Table 2). GO terms enriched for TE were distinct from 
those for ICM. A large number of genes represented by 
GO terms related to metabolism were upregulated in 
TE including proteolysis (27 genes), oxidation reduction 
(23 genes), lipid biosynthetic process (11 genes), steroid 
metabolic process (10 genes), and peptidase activity 
(acting on L-amino acid peptides) (22 genes) as well as 
genes involved in binding reactions [ion binding (86 
genes), cation binding (83 genes), metal ion binding (81 
genes), calcium ion binding (34 genes) and iron ion 
binding (12 genes)]. There was also enrichment for 
genes associated with endo- or exocytosis, membrane 
transport and alterations in cellular architecture as indi- 
cated by GO terms for vesicle-mediated transport (15 
genes), actin filament-based process (14 genes), actin 
cytoskeleton organization (13 genes), cytoskeleton 
organization (13 genes), plasma membrane (43 genes), 
endoplasmic reticulum (32 genes), cytoplasmic vesicle 
(14 genes), vesicle (14 genes), actin cytoskeleton (13 
genes), cell projection (12 genes), vacuole (11 genes), 
endoplasmic reticulum part (11 genes), apical part of cell 
(10 genes), and cytoskeletal arrangement (20 genes). 

Functions of differentially expressed genes were further 
annotated using KEGG (http://www.genome.jp/kegg/). 
Genes upregulated in ICM were enriched in eight terms 
Table 1 GO terms enriched for genes upregulated in the ICM as compared to TE a 

GO term                                                       Count  Percent  P value   FDR b

Biological Process
Regulation of transcription, DNA-dependent                       25      6.7     0.04    43.9
Regulation of RNA metabolic process                              25      6.7     0.04    49.9
Neurological system process                                      12      3.2     0.01    16.9
Regulation of cell proliferation                                 12      3.2     0.02    24.9
Immune response                                                  12      3.2     0.03    35.4
Positive regulation of macromolecule metabolic process           12      3.2     0.03    43.1
Cognition                                                        11      2.9     0.00     2.4
Regulation of transcription from RNA polymerase II promoter      11      2.9     0.02    29.6
Response to organic substance                                    10      2.7     0.01    16.8
Negative regulation of macromolecule metabolic process           10      2.7     0.04    43.9

Cell Component
Plasma membrane                                                  34      9.1     0.02    20.7
Extracellular region                                             30      8.0     0.00     4.2
Extracellular region part                                        19      5.1     0.00     1.9
Extracellular space                                              12      3.2     0.02    16.9

Molecular Function
DNA binding                                                      29      7.8     0.05    46.9
Transcription regulator activity                                 22      5.9     0.04    40.5
Calcium ion binding                                              18      4.8     0.03    32.1
Transcription factor activity                                    17      4.6     0.02    19.4
Enzyme binding                                                   10      2.7     0.01     /./

a Only those GO terms which contained at least 10 differentially expressed genes are listed. 
b False discovery rate (x 100). 



(Table 3A). These included pathways involved in lineage 
commitment (e.g., hematopoietic cell lineage) and differentiation 
(axon guidance) as well as those involved in maintenance 
of stemness and self-renewal (e.g., pathways in 
cancer and Jak-STAT signaling pathway). Genes upregulated 
in TE were enriched in 12 terms (Table 3B). None of 
the terms were in common with KEGG terms enriched for 
genes upregulated in ICM. Terms were preferentially related 
to transmembrane transport (lysosome, aldosterone-regulated 
sodium reabsorption, and ABC transporters), 
lipid or steroid metabolism (PPAR signaling pathway, 
terpenoid backbone biosynthesis, sphingolipid metabol- 
ism, steroid hormone biosynthesis, fatty acid metabol- 
ism) and other metabolic processes (pantothenate and 
CoA biosynthesis). Additional file 3 represents a KEGG 
metabolic pathway map in which pathways that were dif- 
ferentially enriched between ICM and TE were identified 
using iPath2.0 (http://pathways.embl.de/). Note the 
increased metabolic activity in TE as compared to ICM. 



K-means clustering 

The 870 genes that were differentially expressed between 
ICM and TE were clustered into 8 clusters, with 2, 4, 7, 



9, 23, 48, 149 and 628 genes in each cluster (Additional 
file 4). The biggest cluster (628 genes) contained 72.2% 
of all the significant genes and included genes from 
almost all the overrepresented pathways (Table 3). 
Therefore, the k-means analysis did not disclose much information 
on functional expression patterns of differentially 
expressed genes. 



Comparison of ICM-TE differences in the bovine with the 
mouse and human 

The literature was used to identify a group of genes that 
have been identified as being expressed by ICM, TE or 
embryonic stem cells in the mouse [29-32] or human 
[33-38] (Additional file 5). Among the 119 genes consid- 
ered characteristic of ICM or embryonic stem cells, 8 
were significantly upregulated in ICM (KDM2B, 
NANOG, SOX2, SPIC, STAT3, ZC3HAV1, and OTX2) 
and two (IL6R and TFRC) tended (P=0.06 or less) to be 
upregulated in ICM. Conversely, 6 genes considered as 
being expressed in ICM or embryonic stem cells in the 
mouse or human were upregulated in the TE (DAB2, 
DSP, GM2A, SCD, SSFA2, and VAV3). Of 49 genes con- 
sidered characteristic of TE, 12 (AQP11, ATP1B3, CGN, 
Table 2 GO terms enriched for genes upregulated in the TE as compared to ICM a 

GO term                                                       Count  Percent  P value   FDR b

Biological Process
Proteolysis                                                      27      6.4     0.00    6.26
Oxidation reduction                                              23      5.4     0.01   10.40
Intracellular signaling cascade                                  20      4.7     0.03   43.10
Ion transport                                                    20      4.7     0.04   50.64
Vesicle-mediated transport                                       15      3.5     0.00    5.68
Regulation of cell proliferation                                 15      3.5     0.01   11.09
Actin filament-based process                                     14      3.3     0.00    0.00
Actin cytoskeleton organization                                  13      3.1     0.00    0.00
Cytoskeleton organization                                        13      3.1     0.00    1.22
Lipid biosynthetic process                                       11      2.6     0.01   12.45
Steroid metabolic process                                        10      2.4     0.00    0.31
Negative regulation of cell proliferation                        10      2.4     0.00    2.24

Cell Component
Plasma membrane                                                  43     10.1     0.04   40.40
Endoplasmic reticulum                                            32      7.6     0.00    0.00
Cell fraction                                                    16      3.8     0.00    2.50
Cytoplasmic vesicle                                              14      3.3     0.03   31.04
Vesicle                                                          14      3.3     0.04   36.80
Actin cytoskeleton                                               13      3.1     0.00    0.04
Membrane fraction                                                13      3.1     0.01   11.72
Insoluble fraction                                               13      3.1     0.01   15.08
Cell projection                                                  12      2.8     0.04   41.18
Vacuole                                                          11      2.6     0.00    0.87
Endoplasmic reticulum part                                       11      2.6     0.00    4.14
Apical part of cell                                              10      2.4     0.00    0.04

Molecular Function
Ion binding                                                      86     20.3     0.00    0.14
Cation binding                                                   83     19.6     0.00    0.49
Metal ion binding                                                81     19.1     0.00    0.94
Calcium ion binding                                              34      8.0     0.00    0.00
Peptidase activity, acting on L-amino acid peptides              22      5.2     0.00    1.41
Cytoskeletal protein binding                                     20      4.7     0.00    0.00
Actin binding                                                    14      3.3     0.00    0.04
Iron ion binding                                                 12      2.8     0.03   31.90
Lipid binding                                                    11      2.6     0.03   38.82

a Only those GO terms which contained at least 10 differentially expressed genes are listed. 
b False discovery rate (x 100). 



CYP11A, DSC2, ELF5, GATA3, HSD3B1, KRT18, MSX2, 
SFXN and TJP2) were upregulated in TE. CDH24, a cadherin 
reported to be upregulated in the TE of the human 
[33], was expressed in higher amounts in the ICM. 

We also examined expression of ruminant-specific genes 
known to be upregulated in TE. The three examined, IFNT1 
[39], PAG2 [40], and TKDP1 [41], were upregulated in TE. 



We evaluated differences in expression between ICM 
and TE for genes that have been shown in the mouse [7] 
to be important for segregation of ICM and TE lineages 
and subsequent TE differentiation (Table 4). Expression 
of two genes important for ICM commitment, NANOG 
and SOX2, was significantly higher for ICM than TE 
while expression of two other genes important for ICM 
Table 3 KEGG Pathways enriched for genes upregulated in the inner cell mass or trophectoderm 

Term 
    Genes 

Upregulated in Inner Cell Mass (A) 

Antigen processing and presentation 
    CD74, CD8B, HSPA1L, HSPA6, PSME1, BoLA-DRB3 
Complement and coagulation cascades 
    A2M, F2R, OR, PLAUR, C4BPA 
Chemokine signaling pathway 
    ITK, CCL24, CXCL7, GNAI1, GNB5, GNG7, PLCB1, STAT1, STAT4, STAT3 
Axon guidance 
    EPHA4, CHP, DPYSL2, GNAI1, ROBO1, SEMA4G, SLIT2 
Arrhythmogenic right ventricular cardiomyopathy (ARVC) 
    CDH2, DES, GJA1, ITGA2, TCF7L2 
Pathways in cancer 
    CDKN2B, FGF12, FGF16, ITGA2, MMP9, PDGFRA, STAT1, STAT4, STAT3, TCF7L2, FOS, KIT, WNT 
Jak-STAT signaling pathway 
    IL12RB2, IL19, IL6ST, IL7, STAT1, STAT4, STAT3, SPRY2 
Hematopoietic cell lineage 
    CD1A, CD8B, ITGA2, IL7, KIT 

Upregulated in Trophectoderm (B) 

Lysosome 
    ATP6V0A4, GM2A, NPC, CTSB, CTSH, CTSL2, CTNS, GLAA, GALC, MANBA, PLA2G15, SCARB2, ATP6V0G, SLC11A2 
Steroid biosynthesis 
    NSDHL, CYP51A1, FDFT1, SC4MOL 
Aldosterone-regulated sodium reabsorption 
    ATP1B3, NEDD4L, PRKCG, SGK1, SFN 
Vascular smooth muscle contraction 
    ACTA2, ACTG2, CALD1, CALML5, ITPR2, MYLK, MYL6, PRKCH, PRKCG 
PPAR signaling pathway 
    ACSL4, ACSL6, FABP5, ACSL3, SCD, SCP2 
Phosphatidylinositol signaling system 
    CALML3, ITPR2, INPP4B, INPP5D, PRKCG, SYNJ1 
Pantothenate and CoA biosynthesis 
    BCAT1, ENPP1, ENPP3 
Terpenoid backbone biosynthesis 
    HMGCR, ACAT2, IDI1 
Sphingolipid metabolism 
    UGCG, GLA, GALC, SGPP1 
Steroid hormone biosynthesis 
    UGT1A1, UGT1A6, CYP11A1, CYP3A28, HSD3B1 
Fatty acid metabolism 
    ACAT2, ACSL4, ACSL6, ACSL3 
ABC transporters 
    ABCA3, ABCB1, ABCC2, ABCG5 



commitment, POU5F1 and SALL4, did not differ significantly 
between ICM and TE. Numerically, expression of 
these latter two genes was higher for ICM. Four genes 
were examined that are important for TE commitment 
- CDX2, GATA3, TEAD4, and YAP1. Expression of 
GATA3 was significantly higher for TE but there were 
no significant differences in expression between ICM 
and TE for the other three genes. One gene important 
for differentiation of TE later in development, ELF5, was 



expressed in higher amounts in TE (adjusted P=0.022) 
whereas another, EOMES, was barely detectable and not 
different between ICM and TE. 

Characteristics of promoter regions of genes differentially 
expressed between ICM and TE 

The region spanning nucleotide sequences located 200 bp 
upstream to 50 bp downstream of the transcription start 
site was examined for presence of putative TFBS for each 



Table 4 Differences in expression between ICM and TE for genes involved in segregation of ICM and TE in mice a 

Gene symbol  Role in mouse      Mean counts, ICM  Mean counts, TE  Fold change, TE/ICM  Adjusted P value

CDX2         TE commitment                   5.7              2.8                 0.49             0.780
ELF5         TE differentiation              5.3             28.9                 5.41             0.022
GATA3        TE commitment                 363.6            976.7                 2.69             0.018
EOMES        TE differentiation              1.4              0.2                 0.16             0.934
NANOG        ICM commitment               3014.8            620.9                 0.21             0.000
POU5F1       ICM commitment               2394.1           1873.5                 0.78             0.605
SALL4        ICM commitment                  5.3              3.8                 0.71             0.893
SOX2         ICM commitment                816.2            360.7                 0.44             0.005
TEAD4        TE commitment                   7.1             12.0                 1.69             0.894
YAP1         TE commitment                  47.9             43.0                 0.90             1.000

a Source: Chen et al. [7]. 














gene that was differentially expressed between ICM and 
TE. Binding sites for three transcription factors (PLAG1, 
RELA and RREB1) were significantly enriched for genes 
overexpressed in the ICM while binding sites for nine tran- 
scription factors (EGR1, GABPA, KLF4, MYF5, SP1, MZF1, 
NHLH1, PAX5 and ZFX) were significantly enriched for 
TE. For 11 of 12 transcription factors identified as being 
used to regulate genes overexpressed in ICM or TE, there 
was no difference in expression level between ICM and TE. 
The exception was for EGR1, where expression was upre- 
gulated in ICM (Additional file 2), even though the TFBS 
was enriched for genes overexpressed in TE. 

Differences in promoter CpG islands between genes 
overexpressed in ICM or TE 

The percent of genes overexpressed in ICM that were 
classified as CpG positive (46.6%) was lower (P<0.05) 
than for genes overexpressed in TE (55.3%). Moreover, 
the percent of genes classified as CpG positive for genes 
overexpressed in either tissue was higher than the per- 
cent that were classified as CpG positive for the entire 
bovine genome (39.4%). Thus, DNA methylation may 
play a greater role for regulation of genes differentially 
regulated in the ICM and TE than it does for the gen- 
ome as a whole. 

Of the genes that were differentially regulated for ICM 
and TE, three were genes involved in epigenetic modifi- 
cation. These were DNMT1 and KDM2B, overexpressed 
in ICM, and DNMT3A like sequence, overexpressed in 
TE (Additional file 2). 

Confirmation of differences in gene expression between 
ICM and TE by quantitative PCR 

Using isolated ICM and TE from a separate set of blas- 
tocysts than used for SOLiD sequencing, qPCR was per- 
formed to verify treatment effects on gene expression 
for 6 genes (GATA3, ELF5, CDX2, NANOG and SOX2). 
Results for differences between ICM and TE were gene- 
rally consistent with results from deep sequencing 
(Figure 1). In particular, expression was higher for TE 
than ICM for GATA3 (P=0.07) and ELF5 (P<0.05) and 
was higher for ICM than TE for NANOG (P<0.05) and 
SOX2 (P<0.05). One discrepancy with deep sequencing 
results was for CDX2. While there was no significant dif- 
ference between ICM and TE in the deep sequencing 
database (Table 4), mRNA for CDX2 was higher for TE 
than ICM as determined by qPCR (Figure 1). 

Discussion 

Differentiation in the mammalian embryo is dependent 
upon spatial position - cells on the inside of the embryo re- 
main pluripotent for a period until initiation of gastrulation 
while cells on the outer face of the embryo differentiate into 
TE and ultimately form much of the extraembryonic 
membranes. Here, using magnetic-assisted cell sorting and 
high-throughput next generation sequencing, we show the 
consequences of spatial differences between ICM and TE 
and subsequent divergence in lineage commitment for ex- 
pression of genes regulating pluripotency and lineage 
commitment, cellular metabolism, and interactions with 
the maternal system. 

Commitment towards the ICM lineage in the mouse is 
maintained by actions of Pou5f1 (Oct4), Sall4, Sox2 and 
Nanog; Cdx2 in the TE inhibits Pou5f1 expression and 
allows differentiation of extraembryonic membranes 
[3,4,7]. In the bovine, too, SOX2 and NANOG were over- 
expressed in ICM, but expression of POU5F1 and SALL4 
was not significantly different between ICM and TE. A 
high degree of expression of POU5F1 in the TE was 
expected because differences in the regulatory region of 
the POU5F1 gene in cattle as compared to the mouse gene 
make POU5F1 resistant to regulation by CDX2 [6]. None- 
theless, POU5F1 expression is greater in the ICM of cattle 
[6,42]. In the present study, expression of both POU5F1 
and SALL4 was numerically greater for ICM; failure to 
find significant differences between ICM and TE may re- 
flect the small sample size. It should also be kept in 
mind that embryos produced in vitro have altered patterns 
of gene expression relative to embryos produced in vivo 
[43]. Such alterations could change some of the differen- 
tial gene expression between ICM and TE, as has been 
reported for the mouse embryo [44]. 

Analysis of genes upregulated in ICM provides some 
clues as to the signaling pathways required for specifica- 
tion, pluripotency, and other functions of the ICM. A 
total of 8 genes in the KEGG Jak-STAT signaling pathway 
were upregulated. In mice, LIF, which signals through the 
Jak-STAT pathway, can promote pluripotency of cells 




GATA3 ELF5 CDX2 NANOG S0X2 

Target genes 



Figure 1 Differences between inner cell mass (ICM) and 
trophectoderm (TE) in expression of 6 select genes as 
determined by quantitative PCR. Blastocysts were harvested at 
Day 7 and ICM and TE separated by magnetic activated cell sorting. 
Data represent least-squares means ± SEM of results from six 
biological replicates. Open bars represent ICM and filled bars TE. 
*=P<0.05. 
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derived from the ICM [45]. While LIF cannot cause bo- 
vine ICM cells to develop into stem cells [46], other 
molecules that signal through the Jak-STAT pathway are 
likely to be involved in regulation of the ICM. Several 
genes related to cellular migration were upregulated in 
ICM, as indicated by enrichment of the chemokine sig- 
naling pathway (10 genes) and axon guidance (7 genes) 
GO terms. In the mouse, blastomeres of the ICM can 
change position, at least in part to align position with 
subsequent formation of primitive endoderm [47-49]. 
Perhaps, movement is directed by guidance molecules 
such as chemokines. 

Outer cells of the mouse blastocyst are committed to- 
wards the TE lineage through the actions of Yap1, Tead4, 
Gata3, and Cdx2 [3,4,7]. We found no difference in CDX2 
expression between ICM and TE using deep sequencing 
even though it is well established that the gene is expressed 
to a greater extent in TE of the bovine [6,9,42] and CDX2 
expression was higher in TE than ICM in the qPCR experi- 
ment. CDX2 expression was very low in the deep sequen- 
cing experiment, especially compared to that of POU5F1. 
One possibility is that differences in CDX2 expression be- 
tween TE and ICM at Day 7 (as detected by qPCR) become 
reduced at Day 8. As seen earlier [6], other homologues of 
CDX2 were not detected (CDX1) or were nearly non- 
detectable (CDX4) (Additional file 2). 

Another gene involved in TE lineage, GATA3, was 
expressed in higher amounts in TE. A similar but non- 
significant difference in expression between ICM and TE 
was noted earlier [42]. There was no significant differ- 
ence in TEAD4 or YAP1 expression between ICM and 
TE. Similar findings were observed in the bovine for 
TEAD4 [42]. A gene involved in development of extra- 
embryonic ectoderm in mice, ELF5 [7], was overex- 
pressed in TE whereas another gene involved in 
development of extraembryonic membranes, EOMES, 
was barely detectable. In fact, there appears to be an ab- 
sence or very low expression of EOMES in TE between 
Day 7 and 15 of gestation in cattle [6]. In addition, by 
Day 11 of gestation, trophoblast expression of ELF5 is 
inhibited and becomes limited to the epiblast [50]. 

It is notable that several genes characteristically expressed 
in ICM of mouse or human, DAB2, DSP, GM2A, SCD, 
SSFA2, and VAV3 [30,32,37], were significantly overex- 
pressed in the TE of the bovine while CDH24, reported to 
be upregulated in the TE of the human [33], was expressed 
in higher amounts in the ICM of the bovine. Dsp and Dab2 
are indispensable for embryonic development in mice, and 
their disruption by homologous recombination causes postimplantation em- 
bryonic failure [51,52]. Clearly, as first shown by Berg et al. 
[6], divergent evolution in the control of early embryonic 
development means that study across a wide array of spe- 
cies is required to understand developmental processes 
fully. 



By virtue of its position in the embryo, polarized morph- 
ology [53] and tight junctions between its member cells 
[1], the TE is fated to be the cell lineage through which 
the blastocyst interacts directly with the mother in terms 
of nutrient exchange, maternal-conceptus communication, 
and placentation. It appears that executing these functions 
places increased metabolic demands on the TE as com- 
pared to the ICM as indicated by upregulation of genes 
involved in metabolism, particularly those involved in lipid 
metabolism. Lipid accumulation in cultured bovine 
embryos is greater for TE than ICM, although the differ- 
ence depends upon medium [54,55]. 

It is through the TE that nutrients enter the embryo 
and from the TE that secretory products of the embryo 
must enter the uterine environment. Consistent with a 
role for the TE in uptake and delivery was upregulation 
of genes involved in endo- or exocytosis and membrane 
transport. Lysosomal-like structures have been reported 
to be more abundant in TE than ICM in cattle, at least 
for certain media [54,55], and in the mouse [53]. 

Molecules involved in signaling to the mother that 
were upregulated in TE include IFNT1, PAG2 and 
TKDP1. The role for IFNT1 is to act on the maternal 
endometrium to block luteolytic release of prostaglandin 
F2α [39,56]. While this action is initiated later in preg- 
nancy, between Day 15 and 17 of gestation, secretion of 
IFNT occurs as early as the blastocyst stage [57]. TKDP1 
is a member of the Kunitz family of serine proteinase 
inhibitors and may function to limit trophoblast inva- 
siveness in species like the cow with epitheliochorial pla- 
centation [41]. Little is known about the role of PAG2, 
which is the most abundantly expressed of at least 22 
transcribed PAG genes [40]. Unlike some PAG genes 
(the so-called "modern" clade), whose expression is lim- 
ited to trophoblast giant cells formed later in develop- 
ment, PAG2 is expressed widely in the cotyledonary 
trophoblast and is predicted to be an active aspartic pro- 
teinase [58]. 

IFNT1, PAG2 and TKDP1 are all genes that are 
phylogenetically-restricted to ruminants. Another con- 
ceptus product that is produced more widely in mam- 
mals is estrogen. The role for embryonic estrogen is not 
known for most species but blastocyst estrogen has been 
suggested to be involved in hatching from the zona pel- 
lucida in hamsters [59] and in conceptus growth in the 
pig [60]. The bovine blastocyst, too, produces estrogen 
[61] and the upregulation of genes involved in terpenoid 
backbone biosynthesis and steroid hormone biosynthesis 
suggests that the primary source of blastocyst estrogens 
is the TE. 

Following blastocyst formation, the ruminant tropho- 
blast undergoes a series of developmental steps that are 
dependent on changes in cell shape and spatial position, 
including hatching (which requires actin-based 
trophectodermal projections [59]), elongation (which 
leads to an increase in size of the conceptus from about 
0.16 mm at Day 8 to as much as 100 mm or more at 
Day 16 [62]) and eventual attachment to the maternal 
endometrium (commencing around Day 20 in the cow 
[63]). The upregulation of genes in the trophoblast for 
ontologies such as actin filament-based process, actin 
cytoskeleton organization, cell projection and cytoskel- 
etal arrangement reflects the extensive changes in cell 
architecture required for these processes. In addition, 
three cathepsin genes, CTSB, CTSH and CTSL2, were 
upregulated in TE; these proteinases have been impli- 
cated in blastocyst hatching [59,64]. 

Differences in gene expression between ICM and TE 
are probably due in large part to differences in transcrip- 
tion factor usage and to epigenetic modifications. Bind- 
ing sites for the transcription factors PLAG1, RELA and 
RREB1 were enriched for genes overexpressed in ICM 
while binding sites for nine transcription factors (EGR1, 
GABPA, KLF4, MYF5, SP1, MZF1, NHLH1, PAX5 and 
ZFX) were significantly enriched for TE. RELA is a sub- 
unit for NFκB, which in turn has been implicated in dif- 
ferentiation of trophoblast lineages from embryonic 
stem cells [65] and in function of trophoblast giant cells 
[66]. Several of the transcription factors associated with 
genes upregulated in TE are involved in hematopoiesis, 
including EGR1 [67], GABPA [68], MZF1 [69], and ZFX 
[70]. One of these transcriptional factors, GABPA, can 
enhance Pou5f1 expression in mouse embryonic stem 
cells [71] and another, KLF4, is a key regulator of main- 
tenance and induction of pluripotency [72]. The overall 
picture is one where hematopoiesis and stemness are 
under positive regulation in the TE. Another transcrip- 
tion factor associated with regulation of genes upregu- 
lated in TE was SP1. This protein exerts several actions 
to regulate trophoblast development and function, in- 
cluding activation of expression of other transcription 
factors such as Tfap2c [73] and Id1 [74]. In the cow, SP1 
becomes limited to binucleate cells of the trophoblast by 
Day 25 [75]. 

DNA methylation could be important for regulation of 
gene expression in the blastocyst because the promoter 
regions of over half of the genes that were upregulated in 
ICM or TE were classified as CpG positive. Indeed, the 
percent of genes classified as CpG positive for genes over- 
expressed in ICM or TE was higher than the percent that 
were classified as CpG positive for the entire bovine gen- 
ome. Slightly fewer genes that were overexpressed in ICM 
were classified as CpG positive than for genes that were 
overexpressed in TE, which might suggest more inhibition 
of gene expression by methylation in TE. It is noteworthy, 
however, that Niemann et al. [76] did not find a correl- 
ation between degree of CpG island methylation and 
amount of embryonic expression for eight genes 
examined. Recent evidence has been interpreted to signify 
that it is not the methylation state of individual CpGs that 
determines gene expression but rather the methylation sta- 
tus of large regions of DNA that span multiple genes [77]. 

In cattle, there are conflicting data as to whether 
DNA methylation is less extensive for ICM or for TE 
in both embryos produced in vitro and by somatic cell 
nuclear transfer [78-80]. Another epigenetic mark, 
H3K27me3, is similar for both cell types [81]. Of the 
genes that were differentially regulated for ICM and 
TE, three were genes involved in epigenetic modifica- 
tion. Two were overexpressed in ICM: DNMT1, involved 
in maintenance of DNA methylation during succeeding 
cell divisions [77], and KDM2B, a lysine-specific histone 
demethylase which catalyzes demethylation of H3K4 and 
H3K36 [82,83]. In contrast, a DNMT3A like sequence, 
which establishes DNA methylation during development 
and also participates in methylation maintenance [77], 
was overexpressed in TE. The presence of increased tran- 
script abundance for DNMT3A could be interpreted to 
mean that de novo DNA methylation occurs to a greater 
degree in TE, as is indicated by studies with embryos 
produced in vitro [79] and by somatic cell nuclear clon- 
ing [80]. Further research is necessary to determine dif- 
ferences in DNA methylation between TE and ICM at 
the gene-specific and genome-wide level. 

In general, analysis of a separate set of isolated ICM 
and TE by qPCR confirmed the results obtained for dif- 
ferences between cell types by deep sequencing. The ex- 
ception was for CDX2, where there was no difference in 
expression as determined by SOLiD sequencing but 
where expression was greater for TE than ICM as deter- 
mined by qPCR. The discrepancy could reflect either 
day of sampling differences (as discussed earlier) or, 
given the often-repeated observation that CDX2 is 
expressed to a greater extent in TE than ICM [6,9,42], 
an error induced by the deep sequencing procedure. 

In conclusion, differentiation of blastomeres of the 
morula-stage embryo into the ICM and TE of the blasto- 
cyst is accompanied by differences between the two cell 
lineages in expression of genes controlling metabolic 
processes, endocytosis, hatching from the zona pellu- 
cida, paracrine and endocrine signaling with the mother, 
and genes supporting the changes in cellular architec- 
ture, stemness, and hematopoiesis necessary for develop- 
ment of the trophoblast. Much of the process leading to 
this first differentiation event seems to be under the 
control of genes such as NANOG and GATA3 that play 
a central role in lineage commitment in the mouse. As 
others have also found [6,42], however, there are fundamental differ- 
ences from the mouse. Understanding the nature of the 
process of preimplantation development in mammals 
will necessarily require a comparative approach based 
on study of a variety of animal models. 
Conclusions 

Analysis of gene expression indicated that differentiation 
of blastomeres of the morula-stage embryo into the ICM 
and TE of the blastocyst is accompanied by differences 
between the two cell lineages in expression of genes con- 
trolling metabolic processes, endocytosis, hatching from 
the zona pellucida, paracrine and endocrine signaling 
with the mother, and genes supporting the changes in 
cellular architecture, stemness, and hematopoiesis neces- 
sary for development of the trophoblast. 
Additional file 1: Formulas used for enrichment analysis for 
transcription factor binding sites. 

Additional file 2: Differences in gene expression between ICM and 
TE. Genes in which the adjusted P value was <0.05 are color coded (blue 
are upregulated in ICM and red are upregulated in TE). 

Additional file 3: KEGG metabolic pathway map in which pathways 
that were differentially enriched between ICM (blue) and TE (red) 
were identified using iPath2.0. 

Additional file 4: Heatmap constructed by k-mean clustering of the 
870 genes that differ in expression between ICM and TE. The colors 
in the map display the relative standing of the reads count data; blue 
indicates a count value that is lower than the mean value of the row 
while red indicates higher than the mean. The shades of the color 
indicate how far the data are from the mean value of the row. 
Columns represent individual samples of ICM (IC) and TE (TE). 

Additional file 5: Differences in expression between inner cell mass 
(ICM) and trophectoderm (TE) for genes considered as being 
characteristically expressed by ICM and TE in human or mouse. 